Why Visualize?

Can we not rely only on statistical modeling?

Example: Anscombe Data (1)

x1  x2  x3  x4     y1    y2     y3     y4
10  10  10   8   8.04  9.14   7.46   6.58
 8   8   8   8   6.95  8.14   6.77   5.76
13  13  13   8   7.58  8.74  12.74   7.71
 9   9   9   8   8.81  8.77   7.11   8.84
11  11  11   8   8.33  9.26   7.81   8.47
14  14  14   8   9.96  8.10   8.84   7.04
 6   6   6   8   7.24  6.13   6.08   5.25
 4   4   4  19   4.26  3.10   5.39  12.50
12  12  12   8  10.84  9.13   8.15   5.56

Example: Anscombe Data (2)

                    y1                      y2                      y3                      y4
Predictors          Est.  Std. Err  p       Est.  Std. Err  p       Est.  Std. Err  p       Est.  Std. Err  p
(Intercept)         3.00  1.12      0.026   3.00  1.13      0.026   3.00  1.12      0.026   3.00  1.12      0.026
x1                  0.50  0.12      0.002
x2                                          0.50  0.12      0.002
x3                                                                  0.50  0.12      0.002
x4                                                                                          0.50  0.12      0.002
Observations        11                      11                      11                      11
R² / R² adjusted    0.667 / 0.629           0.666 / 0.629           0.666 / 0.629           0.667 / 0.630
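
The near-identical estimates in the table can be checked with a few lines of base R. This is a minimal sketch, assuming R's built-in anscombe data frame (all 11 observations); the helper name fit_pair and the layout of summary_table are ours, chosen for illustration.

```r
# Fit y_i ~ x_i for each of the four Anscombe pairs (built-in dataset).
fit_pair <- function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
}
fits <- lapply(1:4, fit_pair)

# One row per model: intercept, slope and R-squared.
summary_table <- t(sapply(fits, function(fit) {
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}))
rownames(summary_table) <- paste0("y", 1:4)
print(round(summary_table, 3))
```

Rounded to two decimals, all four models give an intercept of 3.00 and a slope of 0.50, which is why the fitted models alone cannot tell the four datasets apart.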

Example: Anscombe Data (3)
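
The figure from this slide is not reproduced in the text. One plausible way to draw the four datasets side by side is sketched below, assuming ggplot2 is installed and using the built-in anscombe data; the names long and p are ours, and the original figure may have looked different.

```r
library(ggplot2)

# Stack the four (x, y) pairs of the built-in anscombe data into long format.
long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("Dataset", i),
             x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
}))

# One panel per dataset, each with its own least-squares line.
p <- ggplot(long, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)
print(p)
```

The fitted lines are practically the same in every panel, while the point clouds are visibly different.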

Learning objectives

This class will focus on two learning objectives:

  • Technical skills
  • Qualitative reasoning

Technical Skills

Developing foundational technical skills that allow us to

  • manipulate data (mainly dplyr, tidyr)
  • present it in static graphs (ggplot2)
  • and in (stand-alone) interactive graphs (plotly)
  • generate multiple views of interactive graphs (ggplot2/plotly + shiny)
  • combine all of the above in dashboards (shiny, RMarkdown’s flexdashboard)
  • Time permitting, we will learn some basic JavaScript coding

These materials will be delivered via lectures and you will be asked to submit some homework assignments.

Technical Skills 2

Today there are MANY software packages and services for data visualization. The software we chose for this class is:

  • Free and open-source
  • Popular
  • Instructive as to general paradigms for generating interactive visualizations for the web.

Technical Skills 3

Interactive visualizations for the web can work in one of two ways: client side or server side.

Client side:

  • Generate standalone HTML files.
  • These contain all the relevant data for the visualization, and once loaded into a web browser all the calculations concerning interactivity are performed by the browser (using JavaScript).

Technical Skills 3

Client side:

  • plotly.js is a JavaScript package that makes interactive plots from within HTML files.
  • We will work with an R package named plotly that will translate R code, data and graphics to JavaScript for us.
  • Similar interfaces to plotly.js exist for Python and MATLAB.
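
As a rough illustration of the client-side route, the sketch below builds a plotly widget in R and defines a helper that writes it to a standalone HTML file. It assumes the plotly and htmlwidgets packages are installed; save_standalone is a hypothetical helper name, and the built-in anscombe data is used only as a stand-in.

```r
library(plotly)

# Build an interactive scatter plot with plotly's R interface.
p <- plot_ly(anscombe, x = ~x1, y = ~y1, type = "scatter", mode = "markers")

# Hypothetical helper: write the widget to a single HTML file that a browser
# can open without R. selfcontained = TRUE embeds the data and the JavaScript
# in that one file (this step typically relies on pandoc being available).
save_standalone <- function(widget, path) {
  htmlwidgets::saveWidget(widget, file = path, selfcontained = TRUE)
}
```

Calling save_standalone(p, "anscombe.html") would then produce a file whose hover and zoom behaviour is handled entirely by the browser.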

Technical Skills 4

Server side:

  • Generate an HTML file that is loaded into the user’s browser.
  • However, the browser will send requests to an R Shiny server that will use the full capacities of R to perform the computations behind the interactivity.
  • This approach is far more flexible and powerful in the results it can achieve, but a server is required for this to work.
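
A minimal sketch of the server-side pattern, assuming the shiny package is installed. The input id "n", the output id "scatter" and the random data are illustrative choices, not part of the course material.

```r
library(shiny)

# UI: a slider whose value is sent to the server whenever it changes.
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

# Server: R redraws the plot each time input$n changes.
server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to start a local server and open the app
```

Unlike a standalone HTML file, this app only works while an R process is running to answer the browser's requests.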

Qualitative reasoning

  • What question are your tools meant to help you answer?
  • Which tools will you choose that best serve your goals?
  • How will you avoid misrepresenting data?

You will at times be asked to read chapters from Cleveland’s “Visualizing Data” and Healy’s “Data Visualization: A Practical Introduction”. You will also be asked to present several times throughout the semester, and we will devote some class time to in-depth feedback.

Example: Gapminder

An example of an interactive (and animated) plot is shown in the following:

Example: Gapminder

Given the data, all it took to generate the animated plot is the following piece of code:

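library(gapminder)  # provides the gapminder data frame
library(ggplot2)
library(plotly)

# frame = year drives the animation; ids = country tracks each point across frames
gg <-
  ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
    geom_point(aes(size = pop, frame = year, ids = country)) +
    scale_x_log10() +
    theme(legend.title = element_blank())

# Convert the static ggplot object into an interactive plotly widget
ggplotly(gg)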

But is it worth anything? It all depends on the story you tell.

Example: Gapminder

There are ALWAYS A LOT of assumptions behind how any data were generated, and behind what our visualizations actually represent.

  • What in the story that you are about to hear is actually shown in the data?
  • What are claims or conjectures that are not necessarily supported by the data?
  • What may the story and presentation be glossing over?

Example: Gapminder

Life Expectancy

  • Life expectancy is an estimate of the average number of additional years that a person of a given age can expect to live.
  • The most common measure of life expectancy is life expectancy at birth.

Life Expectancy

  • Life expectancy is a hypothetical measure. It assumes that the age-specific death rates for the year in question will apply throughout the lifetime of individuals born in that year. The estimate, in effect, projects the age-specific mortality (death) rates for a given period over the entire lifetime of the population born (or alive) during that time.
  • The measure differs considerably by sex, age, race, and geographic location. Therefore, life expectancy is commonly given for specific categories, rather than for the population in general.

Source: Encyclopaedia Britannica

Note: the gapminder website does not mention its source for this measure.
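
To make the "hypothetical measure" point concrete, the sketch below computes a period life expectancy from a single set of age-specific death probabilities, as if those rates applied over a whole lifetime. It is a simplified life-table calculation under stated assumptions: single-year ages, deaths falling mid-year on average, and made-up death probabilities. The function name life_expectancy is ours, and this is not how Gapminder or Britannica necessarily compute their figures.

```r
# Hypothetical helper: remaining life expectancy at `from_age`, given
# qx[i] = probability of dying during age i - 1 (so qx[1] refers to age 0).
life_expectancy <- function(qx, from_age = 0) {
  lx <- c(1, cumprod(1 - qx))               # share of the cohort alive at each exact age
  Lx <- (head(lx, -1) + tail(lx, -1)) / 2   # person-years lived within each age,
                                            # assuming deaths fall mid-year on average
  idx <- from_age + 1
  sum(Lx[idx:length(Lx)]) / lx[idx]
}

# Made-up, Gompertz-style death probabilities for ages 0..109; not real data.
ages <- 0:109
qx <- pmin(1, 0.0002 * exp(0.09 * ages))
qx[length(qx)] <- 1   # close the table: nobody survives past the last age

life_expectancy(qx)                 # at birth
life_expectancy(qx, from_age = 65)  # additional years expected at age 65
```

Changing the death probabilities for one year changes the whole estimate, even though no real cohort will experience exactly those rates throughout its life.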

GDP per Capita (PPP)

  • Gross Domestic Product (GDP) refers to the total monetary value of the goods and services produced within one country.
  • Nominal GDP calculates the monetary value in current, absolute terms.
  • Real GDP adjusts the nominal gross domestic product for inflation.
  • Adjusting GDP for purchasing power parity (PPP) attempts to convert nominal GDP into a number that is easier to compare across countries, through a “basket of goods” approach.
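
The arithmetic behind these adjustments can be sketched as follows. All numbers are hypothetical, the function names are ours, and real statistical agencies use considerably more involved procedures; this only illustrates the direction of each adjustment.

```r
# All figures below are made up for illustration.
nominal_gdp  <- 2.0e12  # local currency units, at current prices
gdp_deflator <- 125     # price index with base year = 100
population   <- 5.0e7
ppp_factor   <- 4       # local currency units needed to buy what one
                        # reference-currency unit buys (same basket of goods)

# Real GDP: strip out price changes relative to the base year.
real_gdp <- function(nominal, deflator) nominal / (deflator / 100)

# PPP conversion: express local-currency GDP in the reference currency,
# using basket prices rather than market exchange rates.
to_ppp <- function(gdp_local, factor) gdp_local / factor

per_capita <- function(gdp, pop) gdp / pop

real_gdp(nominal_gdp, gdp_deflator)                     # 1.6e12 local units
per_capita(to_ppp(nominal_gdp, ppp_factor), population) # 10000 per person
```

Each step depends on a choice (base year, basket of goods, population estimate), which is part of why the resulting figures carry more assumptions than the plot suggests.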

GDP per Capita (PPP)

Note: the gapminder website mentions several sources for GDP per Capita estimates, but does not explain how they were spliced together.

See the following for

  • the deep conceptual difficulties in the notion of “real” GDP: The Flawed Metric
  • a short note about the difficulty of obtaining historical GDP estimates: Known Unknowns

Ambiguity Galore 1

US 2016 Presidential Election Results by County, choropleth
Source

Ambiguity Galore 1

US 2016 Presidential Election Results by County, dot

Source

Ambiguity Galore 2

Compare:

Ambiguity Galore 2

To:

Well intentioned, but silly 1

Well intentioned, but silly 2

Well intentioned, but silly 3

Sometimes, it isn’t “just” the plot

Source: Healy, Chapter 1

Sometimes, it isn’t “just” the plot (2)

Source: Healy, Chapter 1

Good EDA

Visualization process

Example of a simple dashboard

Iowa State University’s Covid-19 dashboard by state / county, which presents past and predicted Covid-19 figures for the US:

https://covid19.stat.iastate.edu/

Example of a simple dashboard

The engines used in the Iowa State dashboard are:

  • shiny (core of work done on server)
  • leaflet (a free, open-source JavaScript library for interactive maps for the web)
    • The core of the work is done by leaflet.js (in the browser); the R package leaflet interfaces R and JavaScript.
    • The basemap is from mapbox.com (not open source, only free for limited use)
  • The time series are made in plotly:
    • The core of the work is done by plotly.js (in the browser), the R package plotly interfaces R and JavaScript
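
For a sense of how the R side of such a map looks, here is a minimal sketch using the leaflet R package. It is not the dashboard's actual code: it uses leaflet's default OpenStreetMap tiles rather than a Mapbox basemap, and the coordinates and popup text are approximate, illustrative values.

```r
library(leaflet)

# Describe the map in R; the rendering and panning/zooming happen in the
# browser via leaflet.js.
m <- leaflet() %>%
  addTiles() %>%                                  # default OpenStreetMap tiles
  setView(lng = -98.6, lat = 39.8, zoom = 4) %>%  # roughly the centre of the contiguous US
  addMarkers(lng = -93.6, lat = 42.0, popup = "Hypothetical marker")
```

Printing m in an interactive session, or placing it inside a shiny app with leafletOutput() and renderLeaflet(), displays the map.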

Example of a fancier shiny dashboard

That’s it!

Questions?